Latent Ambiguity in Latent Semantic Analysis?
Authors
Abstract
Latent Semantic Analysis (LSA) uses SVD-based dimensionality reduction to reduce the high dimensionality of vector representations of documents, where the dimensions of the vectors correspond simply to word counts in the documents. We show that there are two contending, inequivalent formulations of LSA. The distinction between the two is not generally noted: some work adheres to one formulation, and other work adheres to the other. We show, on both a tiny contrived data set and a more substantial word-sense discovery data set, that the empirical outcomes achieved with LSA vary according to which formulation is chosen.
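As a rough illustration of the SVD-based reduction described in the abstract, the sketch below builds a small term-by-document count matrix and projects each document onto the top k left singular vectors. The matrix values, the function name `lsa_document_vectors`, and the choice of projection are assumptions made for illustration only. The abstract does not spell out the two formulations it contrasts, so this sketch shows just one commonly used way of deriving reduced document vectors and does not claim to reproduce either of them.

```python
import numpy as np


def lsa_document_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Return k-dimensional document vectors from a terms x documents count matrix.

    Hypothetical helper: projects each document column onto the top-k
    left singular vectors of the count matrix.
    """
    # Thin SVD: counts = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    U_k = U[:, :k]          # top-k left singular vectors (terms x k)
    return U_k.T @ counts   # k x documents


# Made-up word counts: 4 terms (rows) by 3 documents (columns).
counts = np.array(
    [
        [2.0, 0.0, 1.0],
        [1.0, 0.0, 0.0],
        [0.0, 3.0, 1.0],
        [0.0, 1.0, 2.0],
    ]
)

reduced = lsa_document_vectors(counts, k=2)
print(reduced.shape)  # one 2-dimensional vector per document
```

Under this projection the reduced vectors equal the top k singular values times the corresponding rows of `Vt`; other work scales or omits the singular values, so the resulting similarities between documents can differ.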
Similar resources
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by both expert and novice users. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Weakly Supervised Object Localization with Latent Category Learning
Localizing objects in cluttered backgrounds is a challenging task in weakly supervised localization. Due to large object variations in cluttered images, objects have large ambiguity with backgrounds. However, backgrounds contain useful latent information, e.g., the sky for aeroplanes. If we can learn this latent information, object-background ambiguity can be reduced to suppress the background....
Distributional Semantics Approach to Thai Word Sense Disambiguation
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...
An application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating non-sampling errors, especially measurement error in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, log-linear model, modified path model, and graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
lsemantica: A Stata Command for Text Similarity based on Latent Semantic Analysis
The lsemantica command, presented in this paper, implements Latent Semantic Analysis in Stata. Latent Semantic Analysis is a machine learning algorithm for word and text similarity comparison. Latent Semantic Analysis uses Truncated Singular Value Decomposition to derive the hidden semantic relationships between words and texts. lsemantica provides a simple command for Latent Semantic Analysis ...